Discriminative weighting of multi-resolution sub-band cepstral features for speech recognition
Abstract
This paper explores strategies for recombining independent multi-resolution sub-band based recognisers. The multi-resolution approach is based on the premise that additional cues for phonetic discrimination may exist in the spectral correlates of one sub-band but not in another. Recombination weights are derived via discriminative training using the 'Minimum Classification Error' (MCE) criterion on log-likelihood scores. Under this criterion the weights for the correct and competing classes are adjusted in opposite directions, which enforces separation of confusable classes. Discriminative recombination is shown to provide significant performance gains for both phone classification and continuous recognition tasks on the TIMIT database. Weighted recombination of independent multi-resolution sub-band models is also shown to improve robustness in broadband noise.
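The weight adjustment described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulation, so everything below is an assumption made for illustration: class-dependent weights per sub-band, a misclassification measure that uses only the strongest competing class, a sigmoid loss, and the hypothetical names `combined_scores` and `mce_step`.

```python
import math


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def combined_scores(weights, loglik):
    """weights[c][b] and loglik[c][b] index class c and sub-band b.

    Returns one weighted sum of sub-band log-likelihoods per class.
    """
    return [sum(w * g for w, g in zip(w_c, g_c))
            for w_c, g_c in zip(weights, loglik)]


def mce_step(weights, loglik, correct, lr=0.01, gamma=1.0):
    """One gradient step on a sigmoid MCE loss for a single token.

    Modifies weights in place and returns the loss measured before the update.
    """
    scores = combined_scores(weights, loglik)
    # Assumption: only the best-scoring competitor enters the measure.
    rival = max((c for c in range(len(scores)) if c != correct),
                key=lambda c: scores[c])
    d = scores[rival] - scores[correct]      # d > 0 means a misclassification
    loss = sigmoid(gamma * d)
    slope = gamma * loss * (1.0 - loss)      # derivative of the loss w.r.t. d
    for b in range(len(weights[correct])):
        # d = -score[correct] + score[rival], so the gradients for the two
        # classes have opposite signs and the weights move in opposite directions.
        weights[correct][b] += lr * slope * loglik[correct][b]
        weights[rival][b] -= lr * slope * loglik[rival][b]
    return loss


# Toy example: two classes, three sub-bands, class 0 is the correct one.
toy_loglik = [[-10.0, -14.0, -9.0], [-11.0, -10.0, -11.0]]
toy_weights = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
for step in range(3):
    print(step, mce_step(toy_weights, toy_loglik, correct=0))
```

In this sketch the loss is largest when the competitor outscores the correct class, and each step pushes the two classes' weighted scores apart. A practical system would accumulate such updates over a whole training set and might constrain the weights, for example to remain positive; those details are not stated in the abstract.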
Similar resources
Multi-resolution cepstral features for phoneme recognition across speech sub-bands
Multi-resolution sub-band cepstral features strive to exploit discriminative cues in localised regions of the spectral domain by supplementing the full-bandwidth cepstral features with sub-band cepstral features derived from several levels of sub-band decomposition. Multi-resolution feature vectors, formed by concatenation of the sub-band cepstral features into an extended feature vector, are show...
Maximum likelihood sub-band weighting for robust speech recognition
Sub-band speech recognition approaches have been proposed for robust speech recognition, where full-band power spectra are divided into several sub-bands and the likelihoods or cepstral vectors of the sub-bands are then merged according to their reliability. In conventional sub-band approaches, correlations across the sub-bands are not modeled and the merging weights can only be set empirically ...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...
Multi resolution discriminative models for subvocalic speech recognition
In this work, we investigate the use of discriminative models for automatic speech recognition of subvocalic speech via surface electromyography (sEMG). We also investigate the suitability of multiresolution analysis in the form of discrete wavelet transform (DWT) for sEMG-based speech recognition. We examine appropriate dimensionality reduction techniques for features extracted using different...
A multi-band approach based on the probabilistic union model and frequency-filtering features for robust speech recognition
The multi-band approach has recently been introduced for recognition of speech corrupted by frequency-localized noise, showing higher robustness than the traditional full-band approach. However, the multi-band approach has been found to be less robust to wide-band noise than the full-band approach. In this paper, we present a multi-band recognition system based on the combination of the probabilis...
Publication date: 1998